
Story Memo

This investigation aims to analyze Occupational Safety and Health Administration (OSHA) data in Arkansas to identify which industries are inspected most frequently and, by extension, which may be the most unsafe.

The focus will be on determining which industries or companies have the most severe violations or complaints per business, with an emphasis on construction and poultry businesses due to the high rates of fatalities and injuries.

The construction and poultry industries are known for having high rates of severe injuries and deaths. This investigation will explore OSHA inspection data to identify patterns in workplace safety violations, particularly in cases where employers have routinely ignored OSHA regulations, leading to dangerous working conditions.

For instance, it is estimated that about half of poultry processing workers are Latino, half are women, and a quarter do not possess legal documents to work in the U.S., according to the National Center for Farmworker Health (NCFH). “Chicken catchers” may be more likely to be male, Latino, and undocumented.

This analysis will include data scraping, data cleaning, graphs, and an interactive visualization to present the findings. The story would have other media components, such as social media videos, a calculator to compare ‘how dangerous is your industry’, explanatory videos about the coverage, audio from workers’ stories, and photos of victims and their families.

The story pitches below are based on data and case studies from the OSHA database. In the future, I hope to incorporate interviews with workers, workers’ advocacy groups, more data from state/federal agencies like BLS or NIOSH, and academic research into common hazards in these industries.

The project will also provide a data scraping tool so that newsrooms in other states can replicate this reporting.


Overview

Reporting Process

Data Scraping:

  • Libraries httr and rvest are used to scrape data from the OSHA website.
  • Headers are defined to bypass authentication errors and simulate a human request.
  • The URL is modified to display all results on a single page for easier scraping.

Data Cleaning:

  • The scraped data is parsed and cleaned using the janitor package.
  • Relevant columns are selected, and NAICS codes are used to join OSHA data with industry descriptions.

Data Analysis:

  • Inspection counts are calculated per industry.
  • Fatality and catastrophic (Fat/Cat) incidents are filtered and analyzed by industry.
  • The data is visualized using an interactive Flourish chart.

Code Explanation

Set Up/Libraries

#Loading the libraries
library(httr)
library(rvest)
library(tidyverse)
library(janitor)
library(readxl)
library(lubridate)

Scraping OSHA Data

First, I tried scraping the OSHA website, but I quickly ran into issues: osha.gov returned a ‘403’ authentication error. I then loaded request headers to bypass the error; the headers were written to convince the website that I was not a bot, but a person submitting a request and looking for the data.

With the headers defined, a 403 code no longer appeared, but a 304 response did. I then had to modify my headers by changing the value of the cookie that the site was requesting; I simply set the value to ‘1’. I then defined the URL for the OSHA query with the variables State = Arkansas and Office = Little Rock, looking for Fed/State data covering the ten years ending in 2024.

# Define the headers
headers <- c(
  "Accept" = "text/html,application/xhtml+xml,application/xml;q=0.9,image/avif,image/webp,image/apng,*/*;q=0.8,application/signed-exchange;v=b3;q=0.7",
  "Accept-Encoding" = "gzip, deflate, br, zstd",
  "Accept-Language" = "en-US,en;q=0.9",
  "Cache-Control" = "max-age=0",
  "Cookie" = "_gid=1",
  "If-Modified-Since" = "Mon, 03 Jun 2024 13:39:50 GMT",
  "If-None-Match" = "\"1717421990\"",
  "Priority" = "u=0, i",
  "Sec-Ch-Ua" = "\"Google Chrome\";v=\"125\", \"Chromium\";v=\"125\", \"Not.A/Brand\";v=\"24\"",
  "Sec-Ch-Ua-Mobile" = "?0",
  "Sec-Ch-Ua-Platform" = "\"macOS\"",
  "Sec-Fetch-Dest" = "document",
  "Sec-Fetch-Mode" = "navigate",
  "Sec-Fetch-Site" = "none",
  "Sec-Fetch-User" = "?1",
  "Upgrade-Insecure-Requests" = "1",
  "User-Agent" = "Mozilla/5.0 (Macintosh; Intel Mac OS X 10_15_7) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/125.0.0.0 Safari/537.36"
)

# The URL only displayed the first 20 results per page. Modifying the p_show parameter in the URL to a large value (3500) returned all results on a single page.

# Define the URL
url <- "https://www.osha.gov/ords/imis/establishment.search?establishment=&state=AR&officetype=all&office=627100&sitezip=100000&startmonth=01&startday=01&startyear=2014&endmonth=06&endday=05&endyear=2024&p_case=all&p_violations_exist=both&p_start=&p_finish=20&p_sort=12&p_desc=DESC&p_direction=Next&p_show=3500"

# Make the GET request with headers
response <- GET(url, add_headers(.headers = headers))

# Check the status code
if (status_code(response) == 200) {
  # If the request is successful, parse the HTML content
  content <- content(response, as = "text")
  webpage <- read_html(content)
  
  # Extract all tables
  tables <- webpage %>%
    html_nodes("table") %>%
    html_table(fill = TRUE)
  
  # The scraper initially returned only the first table on the page. After inspecting the HTML, I changed the code to pull the second table, which held all of the relevant data.
  
  # Check if there are at least two tables
  if (length(tables) >= 2) {
    # Print the second table
    cleaned_table <- tables[[2]]
    print(tables[[2]])
  } else {
    print("Less than two tables found on the page.")
  }
} else {
  print("Failed to fetch the page.")
}

#Fix the column names:
cleaned_table <- cleaned_table %>% 
  clean_names()

write.csv(cleaned_table, "osha_table.csv", row.names = FALSE)

Data Cleaning and Analysis

# Load the OSHA data
cleaned_table <- read.csv("osha_table.csv")

# Select relevant columns and clean names
relevant_osha <- cleaned_table %>% 
  select(date_opened, naics, establishment_name, type) %>% 
  mutate(naics1 = as.character(naics))

# Load the NAICS data
naics <- read_excel("/Users/rachellsanchez/Desktop/DJNF_Merrill/OSHA Project/NAICS_codes.xlsx") %>% 
  clean_names() %>% 
  rename(naics1 = x2022_naics_us_code, industry = x2022_naics_us_title)

# Join the OSHA data with the NAICS data
joined_table <- relevant_osha %>%
  inner_join(naics, by = "naics1")

# Find the count of inspections per industry
inspection_counts <- joined_table %>% 
  count(industry) %>% 
  arrange(desc(n))
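
As a quick sanity check on these counts, each industry’s share of all inspections can be computed with one extra mutate() call. This is a minimal sketch using a tiny stand-in table with made-up rows, since the real joined_table comes from the scrape above:

```r
library(dplyr)

# Hypothetical stand-in for joined_table (not real OSHA rows)
joined_table <- tibble::tibble(
  industry = c("Poultry Processing", "Poultry Processing",
               "Highway, Street, and Bridge Construction", "Sawmills")
)

# Count inspections per industry and add each industry's share of the total
inspection_counts <- joined_table %>%
  count(industry, name = "n") %>%
  mutate(share = n / sum(n)) %>%
  arrange(desc(n))

print(inspection_counts)
```

The share column makes it easier to say, for example, what fraction of all Arkansas inspections a single industry accounts for.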

Analyzing Fatalities and Severe Incidents

# Filter for Fat/Cat incidents
fatalities_industry <- relevant_osha %>%
  filter(type == 'Fat/Cat') %>%
  select(establishment_name, type, naics1)

# Join with NAICS data to get industry names
fatalities_industry <- fatalities_industry %>%
  inner_join(naics, by = "naics1")

# Count Fat/Cat incidents per industry
fatcat_counts <- fatalities_industry %>% 
  count(industry) %>% 
  arrange(desc(n))

print(fatcat_counts)

# Save the Fat/Cat data to a CSV
write.csv(fatalities_industry, "fatalities_industry.csv", row.names = FALSE)

Analysis of Specific Industries

# Poultry Processing Fat/Cat incidents by year

joined_table %>% 
  mutate(date1=mdy(date_opened)) %>% 
  mutate(year=year(date1)) %>% 
  filter(industry=="Poultry Processing") %>% 
  filter(type=="Fat/Cat") %>% 
  group_by(year) %>% 
  count(year)

# Electrical Contractors Fat/Cat incidents by year

joined_table %>% 
  mutate(date1=mdy(date_opened)) %>% 
  mutate(year=year(date1)) %>% 
  filter(industry=="Electrical Contractors and Other Wiring Installation Contractors") %>% 
  filter(type=="Fat/Cat") %>% 
  group_by(year) %>% 
  count(year)

# Highway, Street, and Bridge Construction Fat/Cat incidents by year

joined_table %>% 
  mutate(date1=mdy(date_opened)) %>% 
  mutate(year=year(date1)) %>% 
  filter(industry=="Highway, Street, and Bridge Construction") %>% 
  filter(type=="Fat/Cat") %>% 
  group_by(year) %>% 
  count(year)
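
The three pipelines above differ only in the industry name, so they could be collapsed into one helper function. A minimal sketch, assuming the same column names as joined_table (date_opened in mm/dd/yyyy form, industry, type), demonstrated on a small hypothetical stand-in table:

```r
library(dplyr)
library(lubridate)

# Count Fat/Cat incidents per year for any one industry,
# instead of repeating the same pipeline three times
fatcat_by_year <- function(tbl, industry_name) {
  tbl %>%
    mutate(year = year(mdy(date_opened))) %>%
    filter(industry == industry_name, type == "Fat/Cat") %>%
    count(year)
}

# Hypothetical stand-in rows for demonstration only
demo <- tibble::tibble(
  date_opened = c("01/15/2019", "03/02/2019", "07/30/2021"),
  industry = "Poultry Processing",
  type = c("Fat/Cat", "Fat/Cat", "Complaint")
)

fatcat_by_year(demo, "Poultry Processing")
```

With the real joined_table, the same call would reproduce each of the three industry breakdowns above by swapping in the industry name.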

Case Studies

Mdr Construction Inc paid $135,000 to OSHA last year.
Julian Contracting, Inc. paid $22,584 to OSHA for its employee’s death. He was 24.
One worker was six months pregnant and suffered third-degree burns on her face. She was airlifted to the closest hospital.

Key Findings

Despite comprehensive OSHA guidelines being established more than two decades ago, workplaces and industries are still failing to protect workers from severe injuries, accidents, hospitalizations, and even death. Almost all of the deaths seen in Arkansas were due to negligent behavior, and many were not reported to the federal agency in due time.

Cross-referencing this list with BLS figures for 2022, the last year available, men accounted for the large majority of fatal injuries: 72 of the 75 deaths. The largest age group was 35 to 44, though every age group from 25 to 65 was represented. Most were wage and salary workers, representing 90% of the fatal injuries, versus the 7 deaths among the self-employed.

This project serves to expose the industries most likely to endanger workers due to a pattern of violations and laxity around vital safety standards and regulations.

The analysis reveals the frequency and severity of workplace safety violations in Arkansas, a method that can be applied to any state to identify and rank the most dangerous industries nationwide.

Strengths

  • Publicly Accessible - OSHA data is accessible to everyone, and the analysis can be replicated for every region/state. The data includes information on inspections, violations, penalties, and compliance activities, which can be useful for tracking enforcement efforts and compliance trends.
  • Widespread Coverage - By using inspections rather than a variable like complaints or even violations, the OSHA data is broad enough to look across almost all industries and the reports are relatively standardized.
  • Historical Data - The database spans a decent length, and it can be evaluated in 10 and even 20 year intervals.
  • Detailed Incident Reporting - There are detailed findings if the complaint is complete and a violation is issued and not contested. Even information over the penalties and associated fines are provided in the database, which would lead to interesting stories.
  • Data Timeliness - While complaints/investigations in the most recent month may not always be finished, OSHA updates their data every month leading to near real-time reporting.

Weaknesses

  • Data Gaps - There is a lack of data for certain years in specific industries, leading to incomplete-looking graphs. Certain sectors, such as small businesses or self-employed workers, may be underrepresented in the data, leading to gaps in coverage. It would be more beneficial to scrape ‘serious’ OSHA violations and analyze those figures instead.
  • Underreporting - There is potential for underreporting of workplace injuries and illnesses, especially in industries or companies where workers fear retaliation or lack awareness of, or access to, reporting requirements.
  • Coding Nightmare - Data was difficult to scrape because of OSHA’s permissions and cookies on its website, but once the headers and correct URL are provided in the code, it is available. It is even harder to pull the full report with details, but not impossible; the process and code would just take additional time to complete.
  • Limited Context - OSHA data may not always provide the full context of incidents, such as contributing factors outside the workplace or the effects of the COVID-19 pandemic.
  • Further Analysis Needed - This data should also be examined over time to see whether certain industries have improved or declined, but because of time constraints those findings were not produced.

Potential Stories

  • These companies are most investigated and inspected by OSHA in Arkansas. Here are the most dangerous ones.

  • These industries have the most deaths in Arkansas.

  • Workers are falling in Arkansas, and no one is protecting them. How much money in penalties have companies paid because they are endangering workers?

  • How do Arkansas industries and injuries compare to other states?

In the future, I would like to collect data from other states and normalize it by worker population to achieve a comparable sample size, much like ArkansasCovid.com did, so that states can be compared with one another. I would also use other databases like the BLS’s to confirm the data and provide more in-depth demographic information.
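
That normalization could look like the sketch below. The employment figures here are placeholders, not real BLS numbers; in practice they would come from a source such as the BLS Quarterly Census of Employment and Wages for each state:

```r
library(dplyr)

# Hypothetical state-level totals; fatcat_count would come from each
# state's OSHA scrape, employment from BLS (placeholder values only)
state_stats <- tibble::tibble(
  state        = c("AR", "TX"),
  fatcat_count = c(40, 300),
  employment   = c(1300000, 13000000)
)

# Normalize raw incident counts to a rate per 100,000 workers
rates <- state_stats %>%
  mutate(rate_per_100k = fatcat_count / employment * 100000) %>%
  arrange(desc(rate_per_100k))

print(rates)
```

With these placeholder numbers, the smaller state ranks first on the per-worker rate even though its raw count is lower, which is exactly why the normalization matters for cross-state comparison.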


Contact Information

Rachell Sanchez-Smith | | (479) 935-0882